(54) Multilanguage domain name service

(57) A multilingual Domain Name System allows users to use
Domain Names in non-Unicode or non-ASCII encodings. An
international DNS server (or iDNS server) receives
multilingual DNS requests and converts them to a format that
can be used in the conventional Domain Name System. When the
iDNS server first receives a DNS request, it determines the
encoding type of that request. It may do this by considering
the bit string in the top-level domain (or other portion) of
the Domain Name and matching that string against a list of
known bit strings for known top-level domains of various
encoding types. One entry in the list may be the bit string
for ".com" in Chinese BIG5, for example. After the iDNS
server identifies the encoding type of the Domain Name, it
converts the encoding of the Domain Name to Unicode. It then
translates the Unicode representation to an ASCII
representation conforming to the universal DNS standard.
This is then passed into a conventional Domain Name System,
which recognizes the ASCII format Domain Name and returns
the associated IP address.
Description

Background of the Invention 

[0001] The present invention relates to the Domain 
Name Service used to resolve network domain names 
into corresponding network addresses. More particular- 
ly, the invention relates to an alternative or modified Do- 
main Name Service that accepts domain names provid- 
ed in many different encoding formats, not just ASCII. 
[0002] The Internet has evolved from a purely re- 
search and academic entity to a global network that 
reaches a diverse community with different languages 
and cultures. In all areas the Internet has progressed to 
address the localization needs of its audience. Today, 
electronic mail is exchanged in most languages. Con- 
tent on the World Wide Web is now published in many 
different languages as multilingual-enabled software 
applications proliferate. It is possible to send an e-mail
message to another person in Chinese or to view a
World Wide Web page in Japanese.
[0003] The Internet today relies entirely on the Do- 
main Name System to resolve human readable names 
to numeric IP addresses and vice versa. The Domain 
Name System (DNS) is still based on a subset of Latin- 
1 alphabet, thus still mainly English. To provide univer- 
sality, e-mail addresses, Web addresses, and other In- 
ternet addressing formats adopt ASCII as the global 
standard to guarantee interoperability. No provision is
made to allow for e-mail or Web addresses to be in a 
non-ASCII native language. The implication is that any 
user of the Internet has to have some basic knowledge 
of ASCII characters. 

[0004] While this does not pose a problem to technical or
business users who, generally speaking, are able to
understand English as an international language of science,
technology, business and politics, it is a stumbling block
to the rapid proliferation of the Internet to countries
where English is not widely spoken. In those countries, the
Internet neophyte must understand basic English as a
prerequisite to send e-mail in her own native language
because the e-mail address cannot support the native
language even though the e-mail application can. Corporate
intranets have to use ASCII to name their department domain
names and web addresses because the protocols do not support
anything other than ASCII in the domain name field even
though filenames and directory paths can be multilingual in
the native locale.
[0005] Moreover, users of European languages have to
approximate their domain names without accents and so on. A
company like Citroën wishing to have a corporate identity
has to approximate itself to the closest ASCII equivalent
and use "www.citroen.fr", and Mr François from France has to
constantly bear the irritation of deliberately mis-typing
his e-mail address as "francois@email.fr" (as a fictitious
example).
[0006] Currently, user-ids in an e-mail address field can be
in multilingual scripts as operating systems can be
localized to provide fonts in the relevant locale.
Directories and filenames can also be rendered in
multilingual scripts. However, the domain name portion of
these names is restricted to the characters permitted by the
Internet standard in RFC 1035, the standard setting forth
the Domain Name System.

[0007] One justifiable reason for this situation could be
that software developers tended to use overlapping codes.
For example, the Chinese BIG5 and GB2312 encodings (i.e.,
digital representations of glyphs or characters) overlap, as
do the Japanese JIS and Shift-JIS and the Korean KSC5601,
just to name a few. As a result, one cannot easily tell the
difference between BIG5 and JIS, or between GB2312 and
KSC5601, unless an additional parameter specifying the
encoding is included to inform the application client which
encoding is being used. Therefore, to ensure uniqueness of
domain names and certainty of encoding, DNS has stuck to
ASCII.

[0008] Based on RFC 1035, valid domain names are currently
restricted to a subset of the ISO 8859 Latin 1 alphabet,
which comprises the letters A-Z (case insensitive), the
digits 0-9 and the hyphen (-) only. This restriction
effectively makes a domain name support only English or
languages with a romanized form, such as Malay or Romaji in
Japanese, or a roman transliteration, such as transliterated
Tamil. No other script is acceptable; even the extended
ASCII characters cannot be used.

[0009] Unicode is a character encoding system in which
nearly every character of most important languages is
uniquely mapped to a 16 bit value. Since Unicode has laid
down the foundations for a unique, non-overlapping encoding
system, some researchers have begun to explore how Unicode
can be used as the basis for a future DNS namespace, which
can embrace the rich diversity of languages present in the
world today. See M. Durst, "Internationalization of Domain
Names," Internet Draft "draft-duerst-dns-i18n-02.txt," which
can be found at the IETF home page,
http://www.ietf.cnri.reston.va.us/ID.html, July 1998. This
document is incorporated herein by reference in its entirety
and for all purposes. The new namespace should be able to
offer multilingual and multiscript functionality that will
make it easier for non-English speakers to use the Internet.

[0010] Adopting Unicode as the standard character set for a
new Domain Name System avoids overlapping code space for
different language scripts. In this way, it may allow the
Internet community to use domain names in their native
scripts, such as:

www.citroën.ch
www.genève-city.ch

[0011] Unfortunately, several difficulties would preclude
modifying the DNS server and client applications to
implement a multilingual Domain Name System. For example,
all future client applications and all future DNS servers
have to be modified. As both client and server have to be
modified for the system to work, the transition from the old
system to the new system could be difficult. Further, very
few available client applications use native Unicode.
Instead, most multilingual client applications use
non-Unicode encodings, and have strong followings.

[0012] In view of these and other issues, it would be 
highly desirable to have a technique allowing the many 
linguistic encodings to be used in the DNS system. 

Summary of the Invention 

[0013] The present invention provides systems and methods
for implementing a multilingual Domain Name System allowing
users to use Domain Names in non-Unicode and non-ASCII
encodings. While the method may be implemented in various
systems or combinations of systems, for now the implementing
system will be referred to as an international DNS server
(or 'iDNS' server). When the iDNS server first receives a
DNS request, it determines the encoding type of that
request. It may do this by considering the bit string in the
top-level domain of the Domain Name and matching that string
against a list of known bit strings for known top-level
domains of various encoding types. One entry in the list may
be the bit string for ".com" in Chinese BIG5, for example.
After the iDNS server identifies the encoding type of the
Domain Name, it converts the encoding of the Domain Name to
a universal linguistic encoding type (e.g., Unicode). It
then translates the universal linguistic encoding type
representation to an ASCII representation conforming to the
universal DNS standard. This is then passed into a
conventional Domain Name System, which recognizes the ASCII
format Domain Name and returns the associated IP address.
[0014] One aspect of the invention provides a method 
of detecting the linguistic encoding type of a digitally rep- 
resented domain name. The method may be character- 
ized by the following sequence: (a) receiving the digital 
sequence of a prespecified portion (e.g., a top-level do- 
main) of the digitally represented domain name; (b) 
matching the digital sequence from the domain name 
with a known digital sequence from a collection of known 
digital sequences; and (c) identifying an encoding type 
associated with the known digital sequence matching 
the digital sequence from the domain name. Each of the
known digital sequences used in (b) is associated with
a particular linguistic encoding type. Note that the
collection of known digital sequences includes known
digital sequences for at least two different linguistic
encoding types.

[0015] It will often be convenient to provide the collection
in a table containing records having attributes including
known digital sequences and encoding types. In this case,
identifying the encoding type requires identifying the
encoding type of a record having the matching known digital
sequence. Examples of encoding types represented in the
table include ASCII, BIG5, GB2312, shift-JIS, EUC-JP,
KSC5601, and extended ASCII.
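The table lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table contents, the sample top-level domain "公司", and the names KNOWN_TLDS, TABLE and detect_encoding are hypothetical and do not come from the specification. The sketch relies on the codecs shipped with Python's standard library.

```python
# Hypothetical records: (top-level domain text, encoding type).
# A real table would hold the known digital sequences directly.
KNOWN_TLDS = [
    ("公司", "big5"),
    ("公司", "gb2312"),
    ("com", "ascii"),
]

# Map each known digital sequence (bytes) to the encoding types that yield it.
TABLE = {}
for text, encoding in KNOWN_TLDS:
    TABLE.setdefault(text.encode(encoding), []).append(encoding)


def detect_encoding(domain):
    """Return the candidate encoding types for a raw domain name (bytes)."""
    # (a) take the prespecified portion: here, the top-level domain.
    tld = domain.rsplit(b".", 1)[-1]
    # (b) and (c): match it against the table and report the encoding types.
    return TABLE.get(tld, [])
```

A list is returned rather than a single value because two encodings may share the same digital sequence, in which case the caller still has to disambiguate.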
[0016] When at least two known digital sequences match the
digital sequence from the domain name, it will be necessary
to resolve the ambiguity. This may be accomplished by (a)
receiving the digital sequence of a second portion of the
digitally represented domain name; (b) decoding the digital
sequence of the second portion multiple times, each time
using a decoding scheme of a different one of the linguistic
encoding types, each associated with the at least two known
digital sequences; and (c) identifying the decoding that
gives the best result. Alternatively, the ambiguity may be
resolved by first receiving an extended digital sequence
(including both the first and second portions of the domain
name) and then matching that extended sequence against known
digital sequences that may correspond to the extended
sequence. In this case, the collection of known digital
sequences must include some of the extended sequences.
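A minimal sketch of the first disambiguation approach follows. The text does not say how the "best result" is judged, so the criterion used here (keep only the encodings under which the bytes decode without error) is an assumption, and the function name resolve_ambiguity is hypothetical.

```python
def resolve_ambiguity(second_portion, candidates):
    """Decode the second portion once per candidate encoding type.

    Returns the candidates whose strict decoding succeeds. Treating a
    clean decode as the "best result" is an assumption of this sketch.
    """
    survivors = []
    for encoding in candidates:
        try:
            second_portion.decode(encoding)   # strict decoding by default
        except UnicodeDecodeError:
            continue                          # not valid in this encoding
        survivors.append(encoding)
    return survivors
```

Strict decoding alone will not always single out one encoding, since some byte strings are valid in several encodings; a fuller implementation would need an additional plausibility score, or the extended-sequence matching described above.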

[0017] In a specific embodiment, the collection of records
includes a digital sequence (or representation of a digital
sequence) of a 'minimum code resolving string' (MCRS). This
is a digital sequence for a portion of a domain name and is
known to distinguish that domain name - in a particular
encoding type - from every other domain name/encoding type
combination in the collection. The MCRS may be a sub-string
of the top-level domain, a super-string of the top-level
domain, overflow to the second and third level domains,
etc., so long as ambiguity is avoided when matching takes
place.
[0018] As mentioned, the method is particularly applicable
to handling DNS requests. Thus, the method may also involve
(i) receiving a DNS request containing the digitally
represented domain name; (ii) identifying a root level DNS
server responsible for resolving root level domains of the
identified encoding type; and (iii) transmitting the DNS
request to the root level DNS server. Prior to transmitting
the DNS request, the system should convert the domain name's
digital sequence from the identified encoding type to a DNS
encoding type compatible with DNS protocol (e.g., ASCII, or
possibly Unicode or some other universal encoding in the
future). In a preferred embodiment, this conversion takes
place in two operations: (i) converting the domain name's
digital sequence from the identified encoding type to a
universal linguistic encoding type; and (ii) converting the
domain name's digital sequence from the universal linguistic
encoding type to a DNS encoding type compatible with the DNS
protocol.
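The two-operation conversion can be sketched as follows. Step (i) uses Python's built-in codecs to reach Unicode. Step (ii) uses a UTF-5-style rule in which each code point is written as hexadecimal digits and the leading digit is replaced by a letter from G to V; that rule is an assumption modelled on the UTF-5 proposal and is not spelled out in this text. The function names are hypothetical.

```python
def utf5_encode(label):
    """Encode one label with an assumed UTF-5-style rule (see lead-in)."""
    out = []
    for ch in label:
        digits = format(ord(ch), "X")                 # hex, no leading zeros
        first = chr(ord("G") + int(digits[0], 16))    # leading digit 0..F -> G..V
        out.append(first + digits[1:])
    return "".join(out)


def to_dns_name(raw, source_encoding):
    """Convert a raw domain name (bytes) to an ASCII-only form."""
    # Operation (i): identified encoding type -> Unicode (a Python str).
    text = raw.decode(source_encoding)
    # Operation (ii): Unicode -> DNS-compatible ASCII, label by label.
    return ".".join(utf5_encode(label) for label in text.split("."))
```

Under these assumptions the character U+4E2D becomes "KE2D" whether it arrives as BIG5 or as GB2312 bytes, which is the point of passing through a universal encoding first.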

[0019] This invention also provides a mapping table that
associates particular linguistic encoding types with
particular digital sequences. The mapping table includes a
plurality of records, each including the following
attributes: (a) a known digital sequence of a prespecified
portion of a digitally represented domain name; and (b) a
linguistic encoding type associated with the known digital
sequence. The prespecified portion of the digitally
represented domain name may be the digital sequence of the
root level domain in the domain name. The records may also
include a top-level DNS server responsible for resolving
top-level domains of the linguistic encoding type in the
record. Still further, the mapping table may specify the
type of transformation required to convert domain names from
a non-DNS encoding type to a DNS compliant encoding type
(e.g., UTF-5).

[0020] This invention also relates to an apparatus that
may be characterized by the following features: (a) one 
or more processors; (b) memory coupled to at least one 
of the one or more processors; and (c) one or more net- 
work interfaces capable of receiving a first DNS request 
including a domain name in a non-DNS encoding type 
and transmitting a DNS request with the domain name 
in a DNS encoding type that is compatible with the DNS 
protocol. At least one of the one or more processors will 
be designed or configured to convert the domain name 
in the non-DNS encoding type to that domain name in 
the DNS encoding type. The one or more network inter- 
faces should be coupled to a network in a manner al- 
lowing the apparatus to receive client DNS requests pre- 
senting the domain name in the non-DNS encoding 
type. Further, the one or more network interfaces should 
be coupled to the network in a manner allowing the ap- 
paratus to transmit a DNS request to a standard DNS 
server, with the DNS request presenting the domain 
name in the DNS encoding type. 
[0021] The apparatus preferably also includes a map- 
ping table (possibly like one of those described above) 
residing, at least in part, on the memory. Further, at least 
one processor should be configured or designed to iden- 
tify the non-DNS encoding type of the domain name pri- 
or to converting that domain name from the non-DNS 
encoding type to the DNS encoding type. 
[0022] These and other features and advantages of 
the present invention will be described in more detail be- 
low with reference to the drawings. 



Brief Description of the Drawings 

[0023] Figure 1 is a schematic illustration of a network
architecture including an iDNS server between a DNS server
and a client.



[0024] Figure 2 is a process flow diagram depicting 
the resolution of a DNS request presenting a Domain 
Name in a non-DNS encoding type, in accordance with
one embodiment of the present invention. 
[0025] Figure 3A is a process flow diagram depicting a
process for converting a Domain Name in a non-DNS
encoding type to a corresponding Domain Name in a DNS
encoding type.

[0026] Figure 3B is an illustration of the logical com- 
ponents of an iDNS system. 

[0027] Figure 4 is a process flow diagram depicting a
process for determining the encoding type of a Domain Name.

[0028] Figure 5 is an illustration of a logical mapping 
table used to identify encoding types of domain names 
in accordance with one embodiment of this invention. 
[0029] Figure 6 is a "tree" diagram depicting a hierarchy
of Chinese language encodings.
[0030] Figure 7 is a block diagram of a general-pur- 
pose computer system that may be employed to imple- 
ment iDNS functions of the present invention. 


Detailed Description of the Preferred Embodiments 

1. DNS AND UNICODE 

[0031] The present invention transforms multilingual
multiscript names to a form that is compliant with DNS
(e.g., DNS as explained in RFC 1035 as of 1999). These
transformed names may then be relayed as DNS queries to a
conventional DNS server. An exemplary process of how a
localized domain name is resolved to its numeric IP address
is illustrated by Figure 1 below. However, before Figure 1
is described, a few underlying principles and terms will be
discussed.
[0032] Programs rarely refer to hosts and other resources by
their binary network addresses. Instead of binary numbers,
they use ASCII strings, such as www.pobox.org.sg.
Nevertheless, the network itself only understands binary
addresses, so some mechanism is required to convert the
ASCII strings to network addresses. This mechanism is
provided by the Domain Name System.

[0033] The essence of DNS is a hierarchical, domain-based
naming scheme and a distributed database system for
implementing this naming scheme. It is primarily used for
mapping host names and e-mail destinations to IP addresses,
but can be used for other purposes. As mentioned, DNS is
defined in RFCs 1034 and 1035.
[0034] Very briefly, the way DNS is used is as follows. To
map a name onto an IP address, an application program calls
a library procedure called the "resolver," passing it the
name as a parameter. The resolver sends a UDP packet to a
local DNS server, which then looks up the name and returns
the IP address to the resolver, which then returns it to the
caller. With the IP address in hand, the program can
establish a TCP connection with the destination or send it
UDP packets.
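To make the resolver's UDP packet concrete, the sketch below builds the bytes of a standard address query in the message format of RFC 1035. The function name build_query and the query identifier are illustrative assumptions, and the sketch only constructs the message; it does not send it.

```python
import struct


def build_query(name, query_id):
    """Build a DNS query message asking for the A record of `name`."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")      # raises for non-ASCII labels
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"                         # zero-length root label ends the name
    return header + qname + struct.pack("!HH", 1, 1)   # QTYPE=A, QCLASS=IN
```

In this sketch a label containing non-ASCII characters fails at the encoding step, which mirrors the ASCII restriction on domain names discussed in the background.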
[0035] Conceptually, the Internet is divided into many
top-level "domains," where each domain covers many hosts.
Each domain is partitioned into sub-domains and these are
further partitioned, and so on. All these domains can be
represented by a tree. The leaves of the tree represent
domains that have no sub-domains (but do contain machines,
of course). A leaf domain may contain a single host, or it
may represent a company that contains thousands of hosts.

[0036] The top-level domains come in two flavors: generic
and countries. The generic domains are com (commercial), edu
(educational institutions), gov (the United States federal
government), int (certain international organizations), mil
(the United States armed forces), net (network providers),
and org (organizations). The country domains include one
entry for every country, as defined in ISO 3166. Each domain
is named by the path upward from it to the unnamed root. The
components are separated by periods (pronounced "dot").
[0037] In principle, domains can be inserted into the tree
in two different ways. For example, cs.ucb.edu could equally
well be listed under the us country domain as cs.ucb.ct.us.
In practice, however, nearly all organizations in the United
States are under a generic domain, and nearly all outside
the United States are under the domain of their country.
There is no rule against registering under two top-level
domains, but doing so might be confusing, so few
organizations do it.
[0038] Each domain controls how it allocates the domains
under it. For example, Japan has domains ac.jp and co.jp
that mirror edu and com. To create a new domain, permission
is required of the domain in which it will be included. For
example, if an artificial intelligence group is started at
the University of California at Berkeley and wants to be
known as ai.cs.ucb.edu, it needs permission from whomever
manages cs.ucb.edu. Similarly, if a new university is
chartered, say, the University of Lake Tahoe, it must ask
the manager of the edu domain to assign it ult.edu. In this
way, name conflicts are avoided and each domain can keep
track of all its sub-domains. Once a new domain has been
created and registered, it can create its own sub-domain,
such as cs.ult.edu, without getting permission from any
entity higher up in the tree.

[0039] In theory, at least, a single name server could
contain the entire DNS database and respond to all queries
about it. In practice, this server would be so overloaded as
to be useless. Furthermore, if it ever went down, the entire
Internet would be crippled. To avoid the problems associated
with having only a single source of information, the DNS
name space is divided into non-overlapping "zones." Each
zone contains some part of the tree and also contains name
servers holding the authoritative information about that
zone. Normally, a zone will have one primary name server,
which gets its information from a file on its disk, and one
or more secondary name servers, which get their information
from the primary name server.

[0040] When a resolver gets a query about a domain name, it
passes the query to one of the local name servers. If the
domain being sought falls under the jurisdiction of the name
server, such as ai.cs.ucb.edu falling under cs.ucb.edu, it
returns the authoritative resource records. An authoritative
record is one that comes from the authority that manages the
record, and is thus always correct. A given name server may
also contain "cached records," which may be out of date.
[0041] If the domain of interest is remote and no
information about the requested domain is available locally,
the name server sends a query message to the top-level name
server for the domain requested. For example, a local name
server seeking to find the IP address for ai.cs.ucb.edu may
send a UDP packet to the server for edu given in its
database, edu-server.net. It is unlikely that this server
knows the address of ai.cs.ucb.edu, and it probably does not
know cs.ucb.edu either, but it must know all of its own
children, so it forwards the request to the name server for
ucb.edu. In turn, this one forwards the request to
cs.ucb.edu, which must have the authoritative resource
records. Since each request is from a client to a server,
the authoritative record requested works its way back to the
original name server requesting the IP address for
ai.cs.ucb.edu.
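The forwarding chain in this example can be simulated in memory. The sketch below is illustrative only: the NameServer class is hypothetical, no network traffic is involved, and the address 192.0.2.10 is a documentation placeholder rather than a real record.

```python
class NameServer:
    """Toy name server that knows its own records and its child zones."""

    def __init__(self, zone, records=None, children=None):
        self.zone = zone
        self.records = records or {}     # authoritative name -> address
        self.children = children or {}   # child zone name -> NameServer

    def resolve(self, name, path):
        path.append(self.zone)           # record which servers were visited
        if name in self.records:
            return self.records[name]    # authoritative answer
        for child_zone, child in self.children.items():
            if name == child_zone or name.endswith("." + child_zone):
                # Forward the request; the answer works its way back up.
                return child.resolve(name, path)
        return None


cs = NameServer("cs.ucb.edu", records={"ai.cs.ucb.edu": "192.0.2.10"})
ucb = NameServer("ucb.edu", children={"cs.ucb.edu": cs})
edu = NameServer("edu", children={"ucb.edu": ucb})

visited = []
address = edu.resolve("ai.cs.ucb.edu", visited)
```

After the call, visited lists the servers in the order edu, ucb.edu, cs.ucb.edu, matching the chain described in the paragraph above.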
[0042] Once the record gets back to the original name
server, it will be entered into a cache there, in case it is
needed later. However, this information is not
authoritative, since changes made at cs.ucb.edu will not be
propagated to all the caches in the world that may know
about it. For this reason, a cache entry should be removed
or updated frequently. This may be accomplished with a
"time_to_live" field included in each record.
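A time-to-live field can drive cache expiry as in the sketch below. The DnsCache class and its method names are hypothetical, and the injectable clock is a design choice made so that expiry can be exercised without waiting.

```python
import time


class DnsCache:
    """Toy cache whose entries expire after their time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}               # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]      # stale: drop it and report a miss
            return None
        return address
```

A miss after expiry forces the server to query again, which is how changes made at the authoritative server eventually reach the caches.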

[0043] The above example of a method for resolving a domain
name is referred to as recursive querying. Other techniques
exist. For more detail on DNS, see Andrew S. Tanenbaum,
"Computer Networks," 3rd Ed., Prentice Hall, Upper Saddle
River, NJ (1996), from which much of the above discussion
was adapted. See also U.D. Black, "TCP/IP and Related
Protocols," 3rd Ed., McGraw-Hill, San Francisco, CA (1998).
Both of these references are incorporated herein by
reference for all purposes.

[0044] As noted, the DNS protocol is currently based upon a
subset of ASCII, and is thus limited to the Latin alphabet.
Numerous other encodings provide digital representations for
other character sets of the world. Examples include BIG5 and
GB-2312 for Chinese character scripts (traditional and
simplified respectively), Shift-JIS and EUC-JP for Japanese
character scripts, KSC-5601 for Korean character scripts,
and the extended ASCII characters for French and German
characters, for instance.

[0045] Beyond these language-specific encoding types, there
exists the Unicode standard (a "universal linguistic
encoding type") that provides the capacity to encode all the
characters used in the written languages of the world. It
uses a 16-bit encoding that provides code points for more
than 65,000 characters. Unicode scripts include Latin,
Greek, Cyrillic, Armenian, Hebrew, Arabic, Devanagari,
Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada,
Malayalam, Thai, Lao, Georgian, Tibetan, Japanese Kana, the
complete set of modern Korean Hangul, and a unified set of
Chinese/Japanese/Korean (CJK) ideographs. Many more scripts
and characters are to be added shortly, including Ethiopic,
Canadian Syllabics, Cherokee, additional rare ideographs,
Sinhala, Syriac, Burmese, Khmer, and Braille.
EP 1 059 789 A2

[0046] A single 16-bit number is assigned to each code
element defined by the Unicode Standard. Each of these
16-bit numbers is called a code value and, when referred to
in text, is listed in hexadecimal form following the prefix
"U+". For example, the code value U+0041 is the hexadecimal
number 0041 (equal to the decimal number 65). It represents
the character "A" in the Unicode Standard.

[0047] Each character is also assigned a unique name that
specifies it and no other. For example, U+0041 is assigned
the character name "LATIN CAPITAL LETTER A." U+0A1B is
assigned the character name "GURMUKHI LETTER CHA." These
Unicode names are identical to the ISO/IEC 10646 names for
the same characters.
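Code values and character names can be inspected with Python's standard unicodedata module, as in the short sketch below. The helper name describe is hypothetical.

```python
import unicodedata


def describe(ch):
    """Return (U+ notation, decimal code value, character name) for `ch`."""
    code_value = ord(ch)
    return "U+%04X" % code_value, code_value, unicodedata.name(ch)
```

For the letter "A" this yields the code value 65, written U+0041, together with the name LATIN CAPITAL LETTER A, in line with the example in the text.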

[0048] The Unicode Standard groups characters together by
scripts in code blocks. A script is any system of related
characters. The standard retains the order of characters in
a source set where possible. When the characters of a script
are traditionally arranged in a certain order - alphabetic
order, for example - the Unicode Standard arranges them in
its code space using the same order whenever possible. Code
blocks vary greatly in size. For example, the Cyrillic code
block does not exceed 256 code values, while the CJK code
block has a range of thousands of code values.
[0049] Code elements are grouped logically throughout the
range of code values, called the "code space." The coding
starts at U+0000 with the standard ASCII characters, and
continues with Greek, Cyrillic, Hebrew, Arabic, Indic and
other scripts, followed by symbols and punctuation. The code
space continues with Hiragana, Katakana, and Bopomofo. The
unified Han ideographs are followed by the complete set of
modern Hangul. The surrogate range of code values is
reserved for future expansion with UTF-16. Towards the end
of the code space is a range of code values reserved for
private use, followed by a range of compatibility
characters. The compatibility characters are character
variants that are encoded only to enable transcoding to
earlier standards and old implementations which made use of
them.

[0050] Character encoding standards define not only the
identity of each character and its numeric value, or code
position, but also how this value is represented in bits.
The Unicode Standard endorses at least three forms that
correspond to ISO 10646 transformation formats, UTF-7, UTF-8
and UTF-16.

[0051] The ISO/IEC 10646 transformation formats UTF-7, UTF-8
and UTF-16 are essentially ways of turning the encoding into
the actual bits that are used in implementation. UTF-16
assumes 16-bit characters and allows for a certain range of
characters to be used as an extension mechanism in order to
access an additional million characters using 16-bit
character pairs. The Unicode Standard, Version 2.0, Addison
Wesley Longman (1996) (with updates and additions added via
The Unicode Standard, Version 2.1) has adopted this
transformation format as defined in ISO/IEC 10646. This
reference is incorporated herein by reference in its
entirety and for all purposes.

[0052] The second transformation format is known as UTF-8.
This is a way of transforming all Unicode characters into a
variable length encoding of bytes. It has the advantages
that the Unicode characters corresponding to the familiar
ASCII set end up having the same byte values as ASCII, and
that Unicode characters transformed into UTF-8 can be used
with much existing software without extensive software
rewrites. The Unicode Consortium also endorses the use of
UTF-8 as a way of implementing the Unicode Standard. Any
Unicode character expressed in the 16-bit UTF-16 form can be
converted to the UTF-8 form and back without loss of
information. The Unicode Standard specifies unambiguous
requirements for conformance in terms of the principles and
encoding architecture it embodies. A conforming
implementation has the following characteristics, as a
minimum requirement:

characters are 16-bit units;

characters are interpreted with Unicode semantics;

unassigned codes are not used; and,

unknown characters are not corrupted.

[0053] UTF-8 implementations of the Unicode Standard are
conformant as long as they treat each UTF-8 encoding of a
Unicode character (sequence of bytes) as if it were the
corresponding 16-bit unit and otherwise interpret characters
according to the Unicode specification. The full conformance
requirements are available within The Unicode Standard,
Version 2.0, Addison Wesley Longman, 1996, previously
incorporated by reference. UTF-7 is designed to provide 7
bit characters that are useful for 7 bit media/transport.
Email as specified in RFC 822, for example, is a 7 bit
system. UTF-16 is designed for 16 bit media/transport and
UTF-8 is designed for 8 bit media/transport. Most of the
Internet is 8 bit transportable, but there are legacy
systems using 7 bits (e.g., DNS, SMTP email, etc.).

2. TERMINOLOGY



[0054] Some of the terms used herein are not commonly used
in the art. Other terms have multiple meanings in the art.
Therefore, the following definitions are provided as an aid
to understanding the description that follows. The invention
as set forth in the claims should not necessarily be limited
to these definitions.
[0055] Linguistic encoding type - any character or glyph
encoding type (e.g., ASCII or BIG5) now known or used in the
future.

[0056] Universal linguistic encoding type - any linguistic
encoding type, now known or developed in the future, that
encompasses more than one character or glyph set within its
encoding. Unicode is one example. BIG5, iso-8859-11, and
GB-2312 are others.
[0057] Digitally represented - the way characters are
presented as a result of encoding (e.g., in a bit stream,
a hexadecimal format, etc.).

[0058] Digital sequence - a particular sequence of ones and
zeros, hexadecimal characters, or other constituents in a
digital representation.
[0059] 'Portion' of a digitally represented domain 
name - any section or a whole of a domain name; e.g., 
the top-level domain, the second level domain, and the 
top and second level domain together. 
[0060] 'Known' digital sequence - a digital sequence of
interest because it is known to be associated with some
commonly used character combination (or other property of
domain names) encoded in a particular encoding type (e.g.,
the BIG5 digital sequence for '.com').
[0061] 'Collection' of known digital sequences - any 
arrangement of or connection between multiple known 
digital sequences. Typically, though not necessarily, 
stored together logically as a table (e.g., a "mapping ta- 
ble" descrtoed herein). 

[0062] DNS encoding type - an encoding type supported
by the DNS protocol of a network or Internet,
e.g., a limited set of ASCII specified in RFC 1035.
[0063] Non-DNS encoding type - an encoding type
not supported by the DNS protocol under consideration,
e.g., BIG5 under RFC 1035.

3. IMPLEMENTATIONS OF iDNS 

[0064] Turning now to Figure 1, some important components
of a network 10 used in an embodiment of this
invention include a client 12, a corresponding node 14
with whom client 12 wishes to communicate, an iDNS
server 16 and a conventional DNS server 18. The iDNS
server 16 may listen on a DNS port (currently addressed
to the domain name port 53) for multilingual domain
name queries in place of a normal DNS server, which
may include the Berkeley Internet Name Domain ('BIND'
and its executable version 'named'), which is a widely
used DNS server written by Paul Vixie (http://www.isc.org/).

[0065] To understand the role of these components,
assume that client 12 is used by a Chinese student who
wishes to inquire about employment in a Hong Kong
business that operates corresponding node 14. The student
has previously communicated with the business
and has obtained the domain name of that business.
The domain name is provided in native Chinese characters.
Client 12 is outfitted with a keyboard that can
type Chinese language characters and is configured
with software that can recognize encoded Chinese characters
and accurately display them on a computer
screen.

[0066] Now, the student prepares a message to the
Hong Kong business, encloses her resume, and types
in the Chinese domain name as the destination. When
she instructs client 12 to send the message to corresponding
node 14, the system shown in Figure 1 takes
the following actions. First, the corresponding node domain
name is submitted, in the native language, to iDNS
server 16 via a DNS request. The iDNS server 16 recognizes
that the domain name is not in a format that can
be handled by a conventional DNS server. Therefore it
translates the Chinese domain name to a format that
can be used with a conventional DNS server (normally
a limited set of the ASCII characters). The iDNS server
16 then repackages the DNS request, with the translated
corresponding node domain name, and transmits
that request to conventional DNS server 18. DNS server
18 then uses the normal DNS protocol to obtain a network
address for the domain name it received in the
DNS request. The resulting network address is the network
address of corresponding node 14. DNS server 18
packages that network address according to conventional
DNS protocol and forwards the address back to
iDNS server 16. The iDNS server 16, in turn, transmits
the needed network address back to client 12, where it
is placed in the student's message. The message is
packetized, with each packet having a destination network
address corresponding to node 14. Client 12 then
sends the message packets over the Internet to node
14.

[0067] This procedure can be understood more fully
by considering the operations described in the interaction
process flow diagram of Figure 2. As shown there,
client 12 is depicted by a vertical line on the left-hand
side of the figure, iDNS server 16 is depicted by a vertical
line in the center of the figure, and DNS server 18
is depicted by a vertical line on the right-hand side of
the figure.

[0068] Initially, at 203, an application running on client
12 generates a message intended for a network destination.
The domain name for that destination is input in
non-DNS compatible text encoding format. Thus, the
text is encoded in a linguistic encoding type that digitally
represents the characters of the text. As mentioned, ASCII
is but one linguistic encoding type. In preferred embodiments,
the invention handles a wide range of encoding
types. Examples of some in wide use include
GB2312, BIG5, Shift-JIS, EUC-JP, KSC5601, extended
ASCII, and others.

[0069] After the client application creates the message
at 203, the client operating system creates a DNS
request to resolve the domain name at 205. The DNS
request may resemble a conventional DNS request in
most regards. However, the domain name provided in
the request will be provided in a non-DNS encoding format.
The client operating system transmits its DNS request
to iDNS server 16 at 207. Note that the client operating
system may be configured to send DNS requests
to iDNS server 16. In other words, the default
DNS server of client 12 is iDNS server 16.
[0070] The iDNS server 16 extracts the encoded domain
name from the DNS request and generates a
transformed DNS request presenting the domain name
in a DNS compatible encoding format (presently the reduced
set ASCII specified in RFC 1035). See 209. The
iDNS server 16 then transmits its DNS request to conventional
DNS name server 18. See 211. The name
server then uses a conventional DNS protocol to obtain
the IP address of the domain name used in the client's
communication. See 213. Then, at 215, the name server
replies to the iDNS server with the requested IP address.
The iDNS server 16 then transmits the IP address
back to client 12 at 217. Finally, client 12, with IP address
now in hand, sends its communication to the intended
destination. See 219.
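
The exchange of Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the embodiment: the names `idns_proxy_resolve`, `TOY_TRANSFORMS` and `TOY_ZONE` are hypothetical, the transformed name `zhongwen.example` is a placeholder, and the conventional DNS server is replaced by an in-memory dictionary so that the example runs without a network.

```python
from typing import Callable, Dict


def idns_proxy_resolve(
    raw_name: bytes,
    to_dns_name: Callable[[bytes], str],
    conventional_resolve: Callable[[str], str],
) -> str:
    """Handle one client query in the order shown in Figure 2."""
    dns_name = to_dns_name(raw_name)              # 209: re-encode the name for DNS
    ip_address = conventional_resolve(dns_name)   # 211-215: query the DNS server
    return ip_address                             # 217: reply to the client


# Hypothetical stand-ins (assumptions, not data from the specification).
BIG5_NAME = "\u4e2d\u6587.com".encode("big5")     # a Chinese label followed by .com
TOY_TRANSFORMS: Dict[bytes, str] = {BIG5_NAME: "zhongwen.example"}
TOY_ZONE: Dict[str, str] = {"zhongwen.example": "12.34.56.78"}

address = idns_proxy_resolve(
    BIG5_NAME, TOY_TRANSFORMS.__getitem__, TOY_ZONE.__getitem__
)
print(address)
```

Passing the transformation and the resolver as callables keeps the sketch neutral on where each function runs, which matches the alternative placements (client, proxy, or DNS server) discussed in the paragraphs that follow.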

[0071] As indicated above, the domain name must, at
some point, be converted from a non-DNS encoding
type to a DNS compatible encoding type. In the above
examples, this is accomplished with a proxy iDNS server.
This need not be the case, however, as the functionality
necessary for conversion may be embodied in the
client or the conventional DNS server, as well.
[0072] In alternative embodiments, the functions performed
by the proxy iDNS server are implemented in
whole (or in part) on the client and/or on the DNS server.
In one embodiment, operations including detecting an
encoding type, translating a non-DNS encoded domain
to a DNS encoded domain name and identifying a default
name server (operations 305-311 of the Figure 3A
flow chart discussed below) are implemented on an Internet
application (e.g., a multilingual-enabled Web
browser). In this embodiment, code detection and code
conversion are automatically done prior to dispatching
a DNS resolution request to a DNS server. In some embodiments,
the application can provide manually defined
linguistic encoding which obviates the need for
code detection.

[0073] In another alternative embodiment, operations
305-311 can be implemented on the iDNS server. Other
embodiments include collapsing all or some fraction of
the operations of the proxy iDNS into the DNS server.
For example, code for some iDNS functions can be collapsed
into BIND code as a compilable module.
[0074] In Figure 2, the conversion of the domain name
from one linguistic encoding type to a second linguistic
encoding type (compatible with DNS) is performed at
209. As shown in Figure 3A, in accordance with a preferred
embodiment of this invention, this conversion
may take place via a process 301. The process begins
at 303 with the system identifying the encoding type of
the domain name in the DNS request. This is necessary
when the system may be confronted with multiple different
encoding types. After the encoding type has been
identified, the system next determines whether the domain
name was encoded in a DNS compatible encoding
type at 305. Currently, that requires determining whether
the domain name is encoded in the reduced set ASCII
encoding type. If so, further conversion is unnecessary
and process control is directed to 311, which will be
described below.

[0075] In the interesting case, the domain name is encoded
in a non-DNS format. When this occurs, process
control is directed to 307 where the system translates
the domain name to a universal encoding type. In a preferred
embodiment, this universal encoding type is Unicode.
In this case, the characters identified in the native
encoding type are identified in the Unicode standard and
converted to the Unicode digital sequences for those
characters.

[0076] The newly translated domain name is then further
transformed from the universal encoding type to a
DNS compatible encoding type. See 309. Thus, this final
encoding type may be reduced set ASCII. Note that the
translation from the DNS incompatible format to the
DNS compatible format takes place in two steps through
an intermediate universal encoding type. This two step
procedure will be detailed below. It should be understood,
however, that it may be possible to directly convert,
in one step, the DNS incompatible domain name
to the DNS compatible domain name. This may be accomplished
in a system having multiple conversion algorithms,
each designed to convert a specific encoding
type to ASCII (or some other future DNS-compatible encoding
type). In one example, these algorithms may be
modeled after the "Durst algorithm" described above.
Many other suitable algorithms are known or can be developed
with routine effort.

[0077] With a DNS compatible domain name now in
hand, the system need only determine which conventional
DNS name server it should forward the domain
name to. According to normal DNS protocol, the DNS
request might be forwarded to a top-level name server.
As will be described in more detail below, it may be convenient
to have different root name servers handle different
linguistic domains. For example, the Chinese
government may maintain a root name server for Chinese
language domain names, the Japanese government
or a Japanese corporation may maintain a root
name server for Japanese language domain names, the
Indian government may maintain a root name server for
Hindi language domain names, etc. In any event, the
system must identify the appropriate name server at 311
as indicated in Figure 3A. After this has been accomplished,
the conversion process is complete and the
DNS request can be transmitted to the DNS system for
handling according to convention.
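
Operations 305 through 311 of process 301 can be illustrated with a short Python sketch. Several things here are assumptions made for the example rather than details from the specification: the function names, the name-server host names, the regular expression used to approximate the reduced ASCII set of RFC 1035 (letters, digits and hyphens), and `toy_to_dns`, which is a deliberately simple placeholder for step 309 and is not the Durst algorithm. The encoding type identified at 303 is passed in as an argument.

```python
import re
from typing import Callable, Dict, Tuple

# Approximation of the reduced ASCII set: letters, digits, interior hyphens.
_LDH_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?")


def is_dns_compatible(raw_name: bytes) -> bool:
    """Step 305: is the name already in the reduced ASCII set?"""
    try:
        text = raw_name.decode("ascii")
    except UnicodeDecodeError:
        return False
    return all(_LDH_LABEL.fullmatch(label) for label in text.split("."))


def toy_to_dns(name: str) -> str:
    """Placeholder for step 309: hex code points, prefixed with 'x'."""
    def one(label: str) -> str:
        if label.isascii():
            return label
        return "x" + "".join(f"{ord(ch):04x}" for ch in label)
    return ".".join(one(label) for label in name.split("."))


def convert_request(
    raw_name: bytes,
    encoding: str,
    to_dns: Callable[[str], str],
    name_servers: Dict[str, str],
    default_server: str,
) -> Tuple[str, str]:
    """Return (DNS-compatible name, name server to forward the request to)."""
    if is_dns_compatible(raw_name):                       # 305
        dns_name = raw_name.decode("ascii")
    else:
        unicode_name = raw_name.decode(encoding)          # 307: native -> Unicode
        dns_name = to_dns(unicode_name)                   # 309: Unicode -> DNS form
    server = name_servers.get(encoding, default_server)   # 311
    return dns_name, server


result = convert_request(
    "\u4e2d\u6587.com".encode("big5"), "big5", toy_to_dns,
    {"big5": "ns-zh.example"}, "ns-root.example",
)
print(result)
```

Keying the name-server table on the encoding type is one simple way to express the idea that different root name servers may handle different linguistic domains.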
[0078] Preferably, the process depicted in Figure 3A
is performed solely on an iDNS server. However, some
of the process may be performed on a client or a conventional
DNS server. For example, 303 and 305 could
be performed on a client and 309 could be performed
on a conventional DNS server.
[0079] A preferred division of labor for the iDNS function
(327) is depicted in Figure 3B. As shown there, an
iDNS mapper server 321 performs operations 305-311.
To this end, it includes a mapping table (an example of
which is described below with reference to Figure 5) and
can convert all linguistic encoding types to Unicode (or
other suitable universal encoding type). In this embodiment,
a client 325 performs operation 303 and a conventional
DNS server 323 performs the standard DNS
resolving protocol.

[0080] In one implementation, iDNS mapper server
321 runs on a machine (identified by i2.Wns.com for
example) on a designated port (e.g., port number
2000). It accepts a whole portion of a digitally represented
domain name in any linguistic encoding type and returns
a whole portion of a digitally represented domain
name in Unicode transformed to a DNS encoding type
(UTF-5). Note that the mapping table and the conversion
program code may be quite large, thereby increasing
the size of DNS server 323 several fold (if implemented
there). By separating operations 305-311 from
the DNS protocol and running it separately, the amount
of code needed to distribute iDNS is reduced.
[0081] As indicated in the discussion of Figure 3A,
when the system must handle multiple encoding types,
it must be capable of distinguishing one encoding type
from the next. This process was depicted at block 303
and is elaborated on in Figure 4.
[0082] As shown in Figure 4, the process of identifying
an encoding type 401 begins at 403 with the system
identifying the digital sequence of the top-level domain
of the domain name. In the system in place in March
1999, the top-level domains included .com, .edu, .gov,
.mil, .org, .int, .net, and the various two letter country
designations (e.g., At, .sg, .kr, etc.).
[0083] After the digital sequence of the top-level domain
has been identified, the system next matches that
sequence to a particular encoding type. In a preferred
embodiment, this involves matching the sequence
against records in a mapping table at 405. An exemplary
mapping table will be described in more detail below.
For now, simply recognize that the table (or other logical
structure) includes a list of digital sequences for various
top-level domains in the various linguistic encoding
types handled by the system. Each separate record also
includes an associated encoding type identifier. The
system matches the digital sequence under consideration
by simply comparing it against the sequences in the
various records of the mapping table (using a standard
database look up procedure such as a binary search,
hash table, B-tree, etc.). This will typically provide a single
match. However, if multiple entities are responsible
for issuing top-level domains (each responsible for a different
language, for example), then it is possible that the
digital sequences for two top-level domains in different
encoding formats could be identical.
[0084] To address this possibility, the system determines,
at 407, whether multiple records match the digital
sequence under consideration. If not, the process is
complete at 413 with the system deciding to use the encoding
identified in the single matching record. If, on the
other hand, two or more records match, the system must
resolve this ambiguity. It does this by first identifying a
lower-level domain (e.g., a subdomain such as a second
level domain) digital sequence. See 409. In other words,
the domain name under consideration will have a digital
sequence associated with its lower level domains. The
now expanded digital sequence is again matched
against the digital sequences in the mapping table
(405). Note that some records of the table may include
digital sequences for the combination of top-level and
lower level domains (to resolve a potential ambiguity in
the sequences of the top-level domains). After a match
is found at 405, the process proceeds through 407 as
described above.
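
The matching loop of Figure 4 (403 through 413) can be sketched as follows. This is a hedged illustration: the function name, the dictionary layout and every byte sequence in `TOY_TABLE` are hypothetical placeholders rather than real registry data, and the sketch assumes that the byte 0x2E ('.') never occurs inside a multi-byte character, which holds for encodings such as BIG5, GB2312 and Shift-JIS but is not guaranteed in general.

```python
from typing import Dict, List, Optional


def detect_encoding(raw_name: bytes, table: Dict[bytes, List[str]]) -> Optional[str]:
    """Match the top-level domain first, widening the portion on ambiguity."""
    labels = raw_name.split(b".")
    for depth in range(1, len(labels) + 1):
        portion = b".".join(labels[-depth:])      # 403, then 409 on later passes
        candidates = table.get(portion, [])       # 405: look up the sequence
        if len(candidates) == 1:                  # 407: a single match
            return candidates[0]                  # 413: use that encoding
        if not candidates:
            return None                           # no record covers this portion
        # Two or more records match: extend to the next lower-level domain.
    return None


# Hypothetical table: one ambiguous top-level sequence plus two extended
# records that combine it with a second-level sequence.
TOY_TABLE: Dict[bytes, List[str]] = {
    b"com": ["ascii"],
    b"\xb0\xd3": ["big5", "gb2312"],
    b"\xa4\xa4\xa4\xe5.\xb0\xd3": ["big5"],
    b"\xd6\xd0\xce\xc4.\xb0\xd3": ["gb2312"],
}

print(detect_encoding(b"\xa4\xa4\xa4\xe5.\xb0\xd3", TOY_TABLE))
```

A hash-based dictionary is used here for brevity; a binary search or B-tree over sorted sequences would serve equally well.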

[0085] In an alternative embodiment, only the digital
sequences for top-level domains are maintained in the
mapping table. No provision is made for extended sequences
to resolve ambiguities. In this case, when 407
is answered in the affirmative (multiple records do
match), the system identifies each of the potential
matches (candidate encoding types). The sequence under
consideration is then decoded using each of the potential
encoding types. For example, the root domain
digital sequence may have found a match for .net in one
of the Japanese encoding types and .com in one of the
Chinese encoding types.

[0086] One of the decoded strings should be understandable
in the language of the candidate encoding
type. The other(s) should be gibberish. Thus, the system
selects the candidate encoding type providing the best
decoding of the secondary domain. The process is then
concluded at 413 with the system using the selected encoding
type.
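
The specification does not say how a decoding is judged "understandable". As one possible reading, the Python sketch below decodes the second-level label with each candidate encoding and accepts the first candidate whose result appears in a per-language word list. The word lists, the function name and the choice of a vocabulary test are all assumptions made for illustration; a production system would likely need a more robust language model.

```python
from typing import Dict, Iterable, Optional, Set


def pick_by_trial_decoding(
    label: bytes,
    candidates: Iterable[str],
    vocabulary: Dict[str, Set[str]],
) -> Optional[str]:
    """Return the candidate encoding whose decoding is a known word."""
    for encoding in candidates:
        try:
            text = label.decode(encoding)
        except UnicodeDecodeError:
            continue                    # bytes are not even valid in this encoding
        if text in vocabulary.get(encoding, set()):
            return encoding             # decoding is understandable
    return None                         # every candidate produced gibberish


# Hypothetical word lists for two candidate encoding types.
TOY_VOCABULARY: Dict[str, Set[str]] = {
    "big5": {"\u4e2d\u6587"},
    "shift_jis": {"\u65e5\u672c\u8a9e"},
}

second_level = "\u4e2d\u6587".encode("big5")
chosen = pick_by_trial_decoding(second_level, ["shift_jis", "big5"], TOY_VOCABULARY)
print(chosen)
```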

[0087] As indicated at 405 in the discussion of Figure
4, the iDNS server may match a digital sequence for a
top-level domain of a domain name query against
known digital sequences for multiple encoding types. A
mapping table may house the known digital sequences.
Figure 5 provides a mapping table 501 in accordance
with one embodiment of this invention. Each record in
table 501 specifies a minimum code resolving string
(e.g., a top-level domain) for a particular encoding type
(e.g., .com for BIG5).

[0088] As shown, mapping table 501 includes six separate
fields. The first of these is a time to live that specifies
how long before the cached entry expires. Next, a
minimum code resolving string field identifies the digital
sequence of a portion of a domain name (e.g., the digital
encoding for .com in BIG5). Note that the minimum code
resolving string is typically provided as an 8 bit binary
string. To simplify entry and maintenance of minimum
code resolving strings in table 501, a transformation
may be applied to the binary string in order to get the
form shown.

[0089] While the minimum code resolving string may
often be the top-level domain, this need not be the case.
For some linguistic encodings, it may be necessary to
include the second or a higher level domain to uniquely
resolve the type of encoding given in the string because
of an ambiguity. Similarly, it may not always be necessary
to use the whole top-level domain to uniquely determine
the encoding type. This speeds the search for
a match.

[0090] The "authority" specified in the table is the entity
given authority over domain names specified in the
record. This authority can register sub-domains under
its authority. For example, if an "i-dns" entity is given authority
over .com in BIG5, it may have authority to issue
all sub-domain names under .com in BIG5. This ensures
that only unique domain names are assigned. Also,
the authority denotes an entity having dominion over
a name server (or servers) with "authoritative" records
that provide IP addresses for domain names in the
authority's portion of DNS space. The "encoding" field of
table 501 specifies the encoding type of the domain name
matching the record. The "transform" field specifies the
final encoding of the domain name. For example, UTF-5
is the Durst algorithm applied to Unicode (described
below). Finally, a "comments" field contains a text string
identifying what portion of a domain name corresponds
to the minimum code resolving string. Figure 6
illustrates an exemplary domain name tree for resolving
Chinese language domain names. An iDNS server detecting
a Chinese language encoding type will be configured
with default name servers for resolving a domain
name. As shown in Figure 6, under the root there are
multiple top-level domains (e.g., .com, .edu, .sg, etc.).
Under the .sg top-level domain, there are multiple Chinese
language second-level domains such as edu.sg,
and under that, there are multiple domains including
nus.edu.sg, and so on. Similarly, under the top-level .com,
there are multiple second-level Chinese language sub-domains
such as emaS.com.
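
The six fields of mapping table 501 described above can be modeled as a small record type. The Python sketch below is illustrative only: the field names, the sample time to live, and the byte value used for the minimum code resolving string are hypothetical, and hexadecimal is assumed as the "transformation" that makes the binary string easier to enter and maintain, since the text does not say which transformation Figure 5 uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRecord:
    """One record of a mapping table with the six fields named in the text."""
    ttl_seconds: int                   # time to live of the cached entry
    min_code_resolving_string: bytes   # digital sequence that identifies the encoding
    authority: str                     # entity with authority over matching names
    encoding: str                      # encoding type of a matching domain name
    transform: str                     # final (DNS-compatible) encoding
    comments: str                      # what the resolving string corresponds to

    def mcrs_as_hex(self) -> str:
        """Printable form of the 8-bit string (hex is an assumption)."""
        return self.min_code_resolving_string.hex()


# Hypothetical record; the byte value is a placeholder, not registry data.
record = MappingRecord(
    ttl_seconds=86400,
    min_code_resolving_string=b"\xb0\xd3",
    authority="i-dns",
    encoding="BIG5",
    transform="UTF-5",
    comments="placeholder for .com in BIG5",
)
print(record.mcrs_as_hex())
```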

[0091] As noted in the discussion of the embodiment
of Figure 3A, the iDNS system converts the universal
encoding type (e.g., Unicode) of the domain name to a
DNS encoding type. In one preferred embodiment, this
is accomplished using a transformation algorithm defined
by the Internet draft, "Internationalization of Domain
Names", by Martin Durst, previously incorporated
by reference. The algorithm will transform a variable
length data entity to a form that consists of only the
RFC-compliant ASCII monocase alphabets and numbers.
The table below shows the transformation table used in
the Internet draft.
[0092] The first two columns of the table are to be interpreted
as binary (or hexadecimal) values while the
last two columns are to be interpreted as the ASCII
RFC1035-compliant characters. 'Initial' and 'subsequent'
mean the initial nibble (half a byte) of the data
entity and the rest of the data entity respectively. If the
data entity is 2 bytes long (as in the case of UCS-2),
then there will be 4 nibbles in that particular data entity.
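
A nibble-wise transformation of this kind can be sketched in Python as follows. The transformation table itself is not reproduced in this text, so the two alphabets below are an assumption: they follow the UTF-5 scheme as it is commonly described for Durst's draft, with the letters G through V for an initial nibble and the hexadecimal digits 0-9 and A-F for subsequent nibbles. The function names are hypothetical, and each 2-byte UCS-2 unit is treated as one data entity.

```python
# Assumed alphabets: index = nibble value (0x0 .. 0xF).
INITIAL_NIBBLE = "GHIJKLMNOPQRSTUV"
SUBSEQUENT_NIBBLE = "0123456789ABCDEF"


def transform_entity(entity: bytes) -> str:
    """Map one data entity to ASCII, nibble by nibble."""
    nibbles = [n for byte in entity for n in (byte >> 4, byte & 0x0F)]
    first, rest = nibbles[0], nibbles[1:]
    return INITIAL_NIBBLE[first] + "".join(SUBSEQUENT_NIBBLE[n] for n in rest)


def transform_label(label: str) -> str:
    """Transform a Unicode label, one 2-byte (UCS-2 style) unit at a time."""
    units = label.encode("utf-16-be")
    return "".join(
        transform_entity(units[i:i + 2]) for i in range(0, len(units), 2)
    )


print(transform_label("\u4e2d\u6587"))
```

Because the initial-nibble alphabet does not overlap with the hexadecimal digits, the start of each data entity can be found again when the string is decoded.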
[0093] As indicated in the above discussion, to re- 
solve a multilingual domain name, a client application 
will submit the multilingual non-RFC-compliant query to 
an iDNS proxy server. This proxy server will then trans- 
form the query to an RFC-compliant format using this 
transformation algorithm and submit this query to a DNS 
server. 

[0094] At the DNS server, there will be an entry for
this RFC-compliant query that maps to a valid IP address
such as:

U4B807E7RBB4U7BDPI.U696R0E5OAA0U59DQ1 IN A 12.34.56.78
[0095] The DNS server will then return this IP address
in accordance with RFC1035 to the iDNS proxy server.
The proxy will then relay the message containing the
correctly resolved IP address to the client. Note that the
transformed domain name (in ASCII) normally will have
to be registered with the authority responsible for controlling
and issuing conventional DNS domain names.
[0096] Embodiments of the present invention relate to
an apparatus for performing the above-described iDNS
operations. This apparatus may be specially constructed
(designed) for the required purposes, or it may be a
general-purpose computer selectively activated or
reconfigured by a computer program stored in the computer.
The processes presented herein are not inherently
related to any particular computer or other apparatus.
In particular, various general-purpose machines may be
used with programs written in accordance with the
teachings herein, or it may be more convenient to construct
a more specialized apparatus to perform the required
method steps. The required structure for a variety
of these machines will appear from the description given
above.

[0097] In addition, embodiments of the present invention
further relate to computer readable media that include
program instructions for performing various
computer-implemented operations. The media may also include,
alone or in combination with the program instructions,
data files, data structures, tables, and the like. The
media and program instructions may be those specially
designed and constructed for the purposes of the
present invention, or they may be of the kind well known
and available to those having skill in the computer software
arts. Examples of computer-readable media include
magnetic media such as hard disks, floppy disks,
and magnetic tape; optical media such as CD-ROM
disks; magneto-optical media such as floptical disks;
and hardware devices that are specially configured to
store and perform program instructions, such as read-only
memory devices (ROM) and random access memory
(RAM). The media may also be a transmission medium
such as optical or metallic lines, wave guides, etc.
including a carrier wave transmitting signals specifying
the program instructions, data structures, etc. Examples
of program instructions include both machine code,
such as produced by a compiler, and files containing
higher level code that may be executed by the computer
using an interpreter.

[0098] Figure 7 illustrates a typical computer system
in accordance with an embodiment of the present invention.
The computer system 700 includes any number of
processors 702 (also referred to as central processing
units, or CPUs) that are coupled to storage devices including
primary storage 706 (typically a random access
memory, or "RAM") and primary storage 704 (typically a
read only memory, or "ROM"). As is well known in the
art, primary storage 704 acts to transfer data and instructions
uni-directionally to the CPU and primary storage
706 is used typically to transfer data and instructions
in a bi-directional manner. Both of these primary
storage devices may include any suitable type of the
computer-readable media described above. A mass
storage device 708 is also coupled bi-directionally to
CPU 702 and provides additional data storage capacity
and may include any of the computer-readable media
described above. The mass storage device 708 may be
used to store programs, data and the like and is typically
a secondary storage medium such as a hard disk that
is slower than primary storage. It will be appreciated that
the information retained within the mass storage device
708 may, in appropriate cases, be incorporated in
standard fashion as part of primary storage 706 as virtual
memory. A specific mass storage device such as a
CD-ROM 714 may also pass data uni-directionally to the
CPU.

[0099] CPU 702 is also coupled to an interface 710
that includes one or more input/output devices such as
video monitors, track balls, mice, keyboards,
microphones, touch-sensitive displays, transducer card
readers, magnetic or paper tape readers, tablets, styluses,
voice or handwriting recognizers, or other well-known
input devices such as, of course, other computers.
Finally, CPU 702 optionally may be coupled to a
computer or telecommunications network using a network
connection as shown generally at 712. With such
a network connection, it is contemplated that the CPU
might receive information from the network, or might
output information to the network in the course of performing
the above-described method steps. The above-described
devices and materials will be familiar to those
of skill in the computer hardware and software arts.
[0100] The hardware elements described above may
be configured (usually temporarily) to act as one or more
software modules for performing the operations of this
invention. For example, instructions for detecting an encoding
type, transforming that encoding type, and identifying
a default name server may be stored on mass
storage device 708 or 714 and executed on CPU 702 in
conjunction with primary memory 706.
[0101] Although the foregoing invention has been de- 
scribed in some detail for purposes of clarity of under- 
standing, it will be apparent that certain changes and 
modifications may be practiced within the scope of the 
appended claims. 



Claims 

1. A method, implemented on an apparatus, of detecting
the linguistic encoding type of a digitally represented
domain name, the method comprising:

receiving the digital sequence of a prespecified 
portion of the digitally represented domain 
name; 

matching said digital sequence from the do- 
main name with a known digital sequence from 
a collection of known digital sequences, each 
associated with a particular linguistic encoding 
type, and the collection including known digital 
sequences for at least two different linguistic 
encoding types; and 

identifying an encoding type associated with 
the known digital sequence matching the digital 
sequence from the domain name. 

2. The method of claim 1, further comprising receiving
a DNS request containing the digitally represented
domain name.

3. The method of claim 1 or 2, wherein the prespeci- 
fied portion of the digitally represented domain 
name is a minimum code resolving string in the do- 
main name. 

4. The method of claim 1, 2, or 3, further comprising
transforming the format of the digital sequence of
the digitally represented domain name prior to
matching that digital sequence.

5. The method of any of claims 1-4, wherein the collection
of known digital sequences is provided in a
table containing records having attributes including
known digital sequences and encoding types.

6. The method of claim 5, wherein the table includes
records having at least the following encoding
types: ASCII, BIG5, GB2312, Shift-JIS, EUC-JP,
KSC5601, and extended ASCII.

7. The method of claim 5, wherein identifying the encoding
type comprises identifying the encoding type
of a record having the matching known digital sequence.

8. The method of any of claims 1-7, wherein at least
two known digital sequences match the digital sequence
from the domain name, and further comprising:

receiving the digital sequence of a second portion
of the digitally represented domain name;
and

matching the digital sequence of the second
portion with a known digital sequence from the
collection of known digital sequences.

9. The method of claim 2, further comprising: 

identifying a root level DNS server responsible 
for resolving root level domains of the identified 
encoding type; and 

transmitting the DNS request to the root level 
DNS server. 

10. The method of claim 9, further comprising, prior to 
transmitting the DNS request, converting the do- 
main name's digital sequence from the identified 
encoding type to a DNS encoding type compatible 
with DNS protocol. 

11. The method of claim 10, wherein the DNS encoding 
type is ASCII or a universal linguistic encoding type. 

12. The method of claim 10, wherein converting the domain
name's digital sequence comprises:

converting the domain name's digital sequence
from the identified encoding type to a universal
linguistic encoding type; and

converting the domain name's digital sequence
from the universal linguistic encoding type to a
DNS encoding type compatible with the DNS
protocol.

13. A computer program product comprising a machine
readable medium on which is provided program instructions
for performing a method of detecting the
linguistic encoding type of a digitally represented
domain name, the method comprising:

receiving the digital sequence of a prespecified
portion of the digitally represented domain
name;

matching said digital sequence from the domain
name with a known digital sequence from
a collection of known digital sequences, each
associated with a particular linguistic encoding
type, and the collection including known digital
sequences for at least two different linguistic
encoding types; and

identifying an encoding type associated with
the known digital sequence matching the digital
sequence from the domain name.

14. The computer program product of claim 13, wherein
the collection of known digital sequences is provided
in a table containing records having attributes
including known digital sequences and encoding
types.

15. The computer program product of claim 13 or 14,
further comprising program instructions for the following:

receiving a DNS request containing the digitally
represented domain name;

identifying a root level DNS server responsible
for resolving root level domains of the identified
encoding type; and

transmitting the DNS request to the root level
DNS server.

16. The computer program product of claim 15, further
comprising program instructions for the following:

prior to transmitting the DNS request, converting
the domain name's digital sequence from the
identified encoding type to a DNS encoding type
compatible with DNS protocol.

17. On a machine-readable medium, a linguistic encod- 
ing type mapping table that associates particular lin- 

45 guistic encoding types with particular digital se- 
quences, the mapping table comprising a plurality 
of records, and the records comprising; 

a known digital sequence of a prespecified portion
of a digitally represented domain name; and

a linguistic encoding type associated with the
known digital sequence.

18. The mapping table of claim 17, wherein the
prespecified portion of the digitally represented domain
name is the digital sequence of the top-level domain
in the domain name.

19. The mapping table of claim 17 or 18, wherein the
mapping table records include at least the following
linguistic encoding types: ASCII, BIG5, GB2312,
Shift-JIS, EUC-JP, KSC5601, and extended ASCII.

20. The mapping table of claim 17, 18, or 19, wherein 
the records further comprise a top-level DNS server 
responsible for resolving root level domains of the 
linguistic encoding type in the record. 

21. The mapping table of any of claims 17-20, wherein
the records further comprise a transformation for
converting the encoding type to a DNS-compliant
encoding type.

22. The mapping table of any of claims 17-21, wherein
the records further comprise a time to live field.
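
One way to picture a record of the mapping table described in claims 17 to 22 is the following Python sketch. It assumes the time to live field behaves like a cache expiry measured in seconds, which the claims do not spell out. The class name, the field names, and the server address (taken from a documentation-only IP range) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EncodingRecord:
    known_sequence: bytes  # bytes of the prespecified portion, e.g. the TLD
    encoding_type: str     # linguistic encoding type, e.g. "big5"
    root_server: str       # hypothetical server for this encoding type
    ttl_seconds: int       # time to live, assumed to be in seconds
    created_at: float      # timestamp at which the record was stored

    def is_expired(self, now: float) -> bool:
        """True once the assumed time to live has elapsed."""
        return now - self.created_at >= self.ttl_seconds


def live_records(records: List[EncodingRecord], now: float) -> List[EncodingRecord]:
    """Keep only the records whose time to live has not yet elapsed."""
    return [r for r in records if not r.is_expired(now)]
```

A caller would pass the current time, for example `time.time()`, as `now`. Taking it as a parameter keeps the expiry check deterministic and easy to test.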



23. The mapping table of claim 22, wherein the DNS
encoding type is ASCII or a universal linguistic
encoding type.

24. An apparatus comprising:

one or more processors;

memory coupled to at least one of the one or
more processors; and

one or more network interfaces capable of receiving
a first DNS request including a domain
name in a non-DNS encoding type and transmitting
a DNS request with the domain name in
a DNS encoding type that is compatible with the
DNS protocol, wherein at least one of the one
or more processors is designed or configured
to convert the domain name in the non-DNS
encoding type to that domain name in the DNS
encoding type.

25. The apparatus of claim 24, wherein the one or more
network interfaces are coupled to a network in a
manner allowing the apparatus to receive client
DNS requests, wherein the client DNS requests
present the domain name in the non-DNS encoding
type.

26. The apparatus of claim 24 or 25, wherein the one
or more network interfaces are coupled to a network
in a manner allowing the apparatus to transmit a
DNS request to a standard DNS server, wherein the
DNS request presents the domain name in the DNS
encoding type.

27. The apparatus of claim 24, 25, or 26, further
comprising a mapping table residing, at least in part, on
the memory, wherein the mapping table associates
particular linguistic encoding types with particular
digital sequences expected to be found in digitally
encoded domain names.

28. The apparatus of any of claims 24-27, wherein at
least one processor is configured or designed to
identify the non-DNS encoding type of the domain
name prior to converting that domain name from the
non-DNS encoding type to the DNS encoding type.
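
To illustrate what transmitting a request "in a DNS encoding type that is compatible with the DNS protocol" can look like on the wire, the following Python sketch packs an already converted ASCII name into a standard DNS query message using the RFC 1035 layout (a 12-byte header followed by length-prefixed labels). The function names and the default query ID are hypothetical, and the sketch does not send anything over a network.

```python
import struct


def encode_qname(ascii_name: bytes) -> bytes:
    """Encode an ASCII domain name as length-prefixed DNS labels."""
    out = bytearray()
    for label in ascii_name.rstrip(b".").split(b"."):
        if not 0 < len(label) <= 63:
            raise ValueError("each label must be 1 to 63 bytes")
        out.append(len(label))  # one length byte per label
        out += label
    out.append(0)  # zero-length root label terminates the name
    return bytes(out)


def build_query(ascii_name: bytes, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for an A record (type 1) in class IN (class 1)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_qname(ascii_name) + struct.pack("!HH", 1, 1)
    return header + question
```

The 63-byte label limit enforced here is one reason the apparatus has to convert the name first: a standard DNS server only expects labels in this form.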